

Rec'd PCT/PTO 12 OCT 2004






DESCRIPTION 

ENCODING APPARATUS AND ENCODING METHOD, DECODING APPARATUS 
AND DECODING METHOD, RECORDING MEDIUM, AND PROGRAM 



Technical Field 

The present invention relates to an encoding apparatus 
and an encoding method, a decoding apparatus and a decoding 
method, a recording medium, and a program. The present 
invention relates to, for example, an encoding apparatus and 
an encoding method, a decoding apparatus and a decoding 
method, a recording medium, and a program suitable for 
encoding image signals with a higher compression ratio for 
transmission or accumulation. 

Background Art 

Nowadays, apparatuses that comply with standards such as
MPEG (Moving Picture Experts Group) are widely used both for
information distribution by broadcast stations and for
information reception in households. MPEG is an image
compression standard based on orthogonal transformation,
such as discrete cosine transformation, and motion
compensation; it exploits redundancies specific to image
information so that images can be handled as digital signals
and transmitted and accumulated efficiently.

In particular, the MPEG2 (ISO/IEC 13818-2) compression
technique is a standard defined as a general-purpose image
compression scheme, covering interlaced scan images and
progressive scan images, as well as standard-resolution
images and high-definition images. Thus, MPEG2 is widely
used by both professionals and general consumers, as seen in,
for example, the DVD (Digital Versatile Disk) standards.

The use of the MPEG2 compression scheme accomplishes a
high compression ratio and high image quality by assigning
bit rates of, for example, 4 to 8 Mbps for interlaced scan
images with a standard resolution of 720x480 pixels and bit
rates of, for example, 18 to 22 Mbps for interlaced scan
images with a high resolution of 1920x1088 pixels.

Since MPEG2 is intended mainly for high-quality encoding
suitable for broadcasting, it does not support an encoding
scheme with a higher compression ratio. For this reason, the
MPEG4 encoding system has been standardized as an encoding
scheme for a higher compression ratio. Its image encoding
scheme was approved as the international standard
ISO/IEC 14496-2 in December 1998.

Furthermore, the standardization of H.26L (ITU-T Q6/16
VCEG), originally intended for image encoding for video
conferences, is being promoted by ITU-T (International
Telecommunication Union-Telecommunication Standardization
Sector).

H.26L is known as a standard which achieves a higher
encoding efficiency, though it requires a larger number of
arithmetic operations for encoding processing and decoding
processing compared with known encoding schemes such as
MPEG2 and MPEG4.

In addition, the current MPEG4 activities include the
Joint Model of Enhanced-Compression Video Coding, which is
being promoted jointly with ITU-T for the standardization
of an encoding scheme that is based on H.26L, achieves a
higher encoding efficiency, and employs functions not
supported by H.26L.

A known image information encoding apparatus based on
orthogonal transformation, such as discrete cosine
transformation or Karhunen-Loeve transform, and motion
compensation will now be described with reference to Fig. 1.
Fig. 1 shows an example structure of a known image
information encoding apparatus.

In the relevant image information encoding apparatus,
an input image signal, as an analog signal, is converted to
a digital signal by an A/D conversion section 1 and the
digital signal is then passed to a picture sorting buffer 2.
The picture sorting buffer 2 rearranges frames of the image
information from the A/D conversion section 1 according to
the GOP (Group of Pictures) structure of the image
compression information output by the relevant image
information encoding apparatus.

Images that are subjected to intra-encoding (encoding
within an image) will first be described. From the picture
sorting buffer 2, the image information of an image to be
subjected to intra-encoding is passed to an orthogonal
transformation section 4 via an adder 3.

In the orthogonal transformation section 4, the image
information is subjected to orthogonal transformation (e.g.,
discrete cosine transformation or Karhunen-Loeve transform),
and the obtained transform coefficient is passed to a
quantization section 5. In the quantization section 5, the
transform coefficient supplied from the orthogonal
transformation section 4 is subjected to quantization
processing under the control of a rate control section 8
based on the amount of transform coefficient data
accumulated in an accumulation buffer 7.

In a lossless encoding section 6, an encoding mode is
determined based on the quantized transform coefficient,
quantization scale, etc. supplied from the quantization
section 5, and the determined encoding mode is subjected to
lossless encoding (e.g., variable-length encoding or
arithmetic coding) to form information to be stored in the
header of an image encoding unit. Furthermore, the encoded
encoding mode is supplied to the accumulation buffer 7 for
accumulation. The encoded encoding mode accumulated in the
accumulation buffer 7 is output to the subsequent stage as
image compression information.

In addition, in the lossless encoding section 6, the
quantized transform coefficient is subjected to lossless
encoding and the encoded transform coefficient is
accumulated in the accumulation buffer 7. The encoded
transform coefficient, accumulated in the accumulation
buffer 7, is also output to the subsequent stage as image
compression information.

In a dequantization section 9, the transform
coefficient quantized by the quantization section 5 is
dequantized. In an inverse orthogonal transformation
section 10, the dequantized transform coefficient is
subjected to inverse orthogonal transformation processing
and decoded image information is generated. The generated
decoded image information is accumulated in a frame memory
11.

Images that are subjected to inter-encoding (encoding
between images) will now be described. From the picture
sorting buffer 2, the image information of an image to be
subjected to inter-encoding is supplied to the adder 3 and a
motion prediction/compensation section 12.

In the motion prediction/compensation section 12, image
information for reference that corresponds to the image from
the picture sorting buffer 2 that is subjected to
inter-encoding is read out from the frame memory 11 and then
subjected to motion prediction/compensation processing to
generate reference image information, which is then supplied
to the adder 3. Furthermore, motion vector information
obtained as a result of motion prediction/compensation
processing in the motion prediction/compensation section 12
is supplied to the lossless encoding section 6.

In the adder 3, a differential signal is generated between
the image information of the image from the picture sorting
buffer 2 that is subjected to inter-encoding and the
reference image information from the motion
prediction/compensation section 12.

When an image which is subjected to inter-encoding is
to be processed, the differential signal is subjected to
orthogonal transformation in the orthogonal transformation
section 4, and the obtained transform coefficient is
supplied to the quantization section 5. In the quantization
section 5, the transform coefficient supplied from the
orthogonal transformation section 4 is subjected to
quantization processing under the control of the rate
control section 8.

In the lossless encoding section 6, an encoding mode is
determined based on the transform coefficient quantized by
the quantization section 5 and the quantization scale, as
well as the motion vector information supplied from the
motion prediction/compensation section 12 and other
information. The determined encoding mode is then subjected
to lossless encoding to generate information to be stored in
the header of an image encoding unit. The encoded encoding
mode is accumulated in the accumulation buffer 7. The
encoded encoding mode accumulated in the accumulation buffer
7 is output as image compression information.

Furthermore, in the lossless encoding section 6, motion
vector information from the motion prediction/compensation
section 12 is subjected to lossless encoding processing to
generate information to be stored in the header of the image
encoding unit.

When an image which is subjected to inter-encoding is
to be processed, the processing in the dequantization
section 9 and the subsequent processing are carried out in
the same manner as with intra-encoding, and will not be
described.

A known image information decoding apparatus which
receives image compression information output by the known
image information encoding apparatus shown in Fig. 1 to
restore an image signal will now be described with reference
to Fig. 2. Fig. 2 shows an example structure of a known
image information decoding apparatus.

In the relevant image information decoding apparatus,
image compression information which has been input is
temporarily stored in an accumulation buffer 21 and
transferred to a lossless decoding section 22. The lossless
decoding section 22 applies lossless decoding (e.g.,
variable-length decoding or arithmetic decoding) to the
image compression information based on a predetermined
format of image compression information to acquire the
encoding mode information stored in the header and supplies
it to a dequantization section 23. The lossless decoding
section 22 also acquires the quantized transform coefficient
to supply it to the dequantization section 23. Furthermore,
if the frame to be decoded has been subjected to
inter-encoding, the lossless decoding section 22 also decodes
the motion vector information stored in the header of the
image compression information and supplies the information
to a motion prediction/compensation section 28.

The dequantization section 23 dequantizes the quantized
transform coefficient supplied from the lossless decoding
section 22, and supplies the obtained transform coefficient
to an inverse orthogonal transformation section 24. The
inverse orthogonal transformation section 24 applies inverse
orthogonal transformation (e.g., inverse discrete cosine
transformation or inverse Karhunen-Loeve transform) to the
transform coefficient based on a predetermined format of the
image compression information.

If the relevant frame has been subjected to
intra-encoding, the image information subjected to inverse
orthogonal transformation is stored in a picture sorting
buffer 26 via an adder 25, converted to an analog signal by
a D/A conversion section 27, and then output to the
subsequent stage. The image information subjected to
inverse orthogonal transformation is also stored in a frame
memory 29.

Furthermore, if the relevant frame has been subjected
to inter-encoding, a reference image is generated in the
motion prediction/compensation section 28 based on the
motion vector information from the lossless decoding section
22 and the image information stored in the frame memory 29
and is then supplied to the adder 25. In the adder 25, the
reference image from the motion prediction/compensation
section 28 is combined with the output from the inverse
orthogonal transformation section 24 to generate image
information. The other processing is carried out in the
same manner as with a frame subjected to intra-encoding and
will not be described.

According to H.26L, two types of lossless encoding
schemes are defined: UVLC (Universal Variable Length Code),
which is one type of variable-length encoding, and CABAC
(Context-based Adaptive Binary Arithmetic Coding), which is
one type of arithmetic coding. Thus, the user can select
either UVLC or CABAC as the lossless encoding scheme.
The information indicating whether the lossless encoding
scheme used is UVLC or CABAC is specified in the field
called Entropy Coding included in the RTP Parameter Set
Packet of the RTP layer in the image compression information.

Arithmetic coding, to which CABAC belongs, will now be
described. In arithmetic coding, any message (including a
plurality of alphabetic symbols) is represented as one point
in a semi-open interval 0.0 ≤ x < 1.0, and the code is
generated based on the coordinates of this point.

First, the semi-open interval 0.0 ≤ x < 1.0 is divided
into subintervals, each corresponding to a symbol, on the
basis of the occurrence probabilities of the symbols
included in the alphabetic sequence.

Fig. 3 shows an example of the occurrence probabilities
of symbols s_1 to s_7 with their respective subintervals. In
arithmetic coding, the upper limit and the lower limit of a
subinterval are determined on the basis of the cumulative
occurrence probability of each symbol, as shown in Fig. 3.
The lower limit of the subinterval for the symbol s_i
(i = 1, 2, ..., 7) is equal to the upper limit of the
subinterval for the preceding symbol s_{i-1}, and the upper
limit of the subinterval for the symbol s_i is equal to the
value obtained by adding the occurrence probability of the
symbol s_i to the lower limit of the subinterval for the
symbol s_i.

Let us assume that (s_2 s_1 s_3 s_6 s_7) is input as a
message. Here, the symbol s_7 is assumed to be a terminal
symbol which represents the end of the message. In short,
the message ends with this terminal symbol. The arithmetic
coding scheme calculates a subinterval corresponding to each
symbol included in the message (s_2 s_1 s_3 s_6 s_7), as
shown in Fig. 4. In other words, the interval assigned as
shown in Fig. 3 is divided in proportion to the cumulative
occurrence probability of the subsequent symbol. The
subinterval obtained finally is the range which includes the
value representing the message. In this manner, any value in
this range can uniquely restore the corresponding message.
It is noted, however, that a value that can be represented
as a sum of negative powers of two in the semi-open interval
is used to represent the message, taking the encoding
efficiency into consideration.
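
The interval narrowing described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the three-symbol alphabet and its probabilities are hypothetical (the values of Fig. 3 are not reproduced here), and the function names are chosen for this example only.

```python
from fractions import Fraction

# Hypothetical occurrence probabilities (NOT the values of Fig. 3).
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def subintervals(probs):
    """Assign each symbol a subinterval [low, high) of [0, 1).

    The lower limit of each subinterval equals the upper limit of the
    preceding symbol's subinterval (cumulative probability).
    """
    table, low = {}, Fraction(0)
    for symbol, p in probs.items():
        table[symbol] = (low, low + p)
        low += p
    return table

def encode_interval(message, probs):
    """Narrow [0, 1) symbol by symbol; return the final [low, high)."""
    table = subintervals(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = table[symbol]
        # Scale the symbol's subinterval into the current interval.
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

print(encode_interval("ab", PROBS))  # final interval for the message "ab"
```

Any value inside the returned interval identifies the message, provided the decoder uses the same probability table.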

More specifically, in this example, the value obtained
by Expression (2) shown below represents the message
included in the semi-open interval 0.21164 ≤ x < 0.2117 on
the basis of Expressions (1) shown below.

2^{-1} = 0.5
2^{-2} = 0.25
2^{-3} = 0.125
2^{-4} = 0.0625
2^{-5} = 0.03125
2^{-6} = 0.015625
2^{-7} = 0.0078125
2^{-8} = 0.00390625
2^{-9} = 0.001953125
2^{-10} = 0.0009765625
2^{-11} = 0.00048828125
2^{-12} = 0.000244140625
... (1)

2^{-3} + 2^{-4} + 2^{-6} + 2^{-7} + 2^{-11} + 2^{-12} = 0.211669921875 ... (2)

Thus, a code length of 12 bits, which can represent the
values from 2^{-1} to 2^{-12}, is sufficient for the message
(s_2 s_1 s_3 s_6 s_7), and the message is encoded into
(001101100011).
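
The choice of the 12-bit code can be checked with a short sketch. It assumes only the interval bounds 0.21164 and 0.2117 quoted above and searches for the shortest binary fraction lying inside the interval; the function name is hypothetical.

```python
from fractions import Fraction
import math

def shortest_binary_code(low, high):
    """Return the shortest bit string b1 b2 ... bn such that the value
    sum(b_i * 2**-i) lies in the semi-open interval [low, high)."""
    n = 1
    while True:
        # Smallest multiple of 2**-n that is not below the lower limit.
        k = math.ceil(low * 2**n)
        if Fraction(k, 2**n) < high:
            return format(k, f"0{n}b")
        n += 1

code = shortest_binary_code(Fraction("0.21164"), Fraction("0.2117"))
print(code)  # 001101100011
```

The set bits are at positions 3, 4, 6, 7, 11, and 12, which matches the terms of Expression (2).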

CABAC defined in H.26L will now be described. Details
of CABAC are described in a document "Video Compression
Using Context-Based Adaptive Arithmetic Coding", Marpe et al,
ICI01 (hereinafter, referred to as Document 1). CABAC has
the following three features, compared with UVLC, which is
also defined in H.26L.

A first feature is a capability of eliminating the 
redundancy between symbols by using a context model 
appropriate for each symbol to be encoded to carry out 
arithmetic coding based on an independent probability model. 

A second feature is a capability of assigning a bit
rate of a non-integer value to each symbol in arithmetic
coding, i.e., a capability of achieving an encoding
efficiency close to the entropy.

A third feature is adaptive encoding. For example, the
statistical data of a motion vector varies in space and
time, as well as with respect to bit rates and sequences.
Applying adaptive encoding enables encoding to be carried
out in response to such variations.

Fig. 5 shows a typical structure of a CABAC encoder to
which CABAC is applied. In the relevant CABAC encoder, a
context modeling section 31 first converts the symbol of any
syntax element in image compression information to an
appropriate context model according to the history. Such
modeling is called context modeling. The context model for
each syntax element in image compression information will be
described below.

A binarization section 32 binarizes a symbol which is
not binarized. In an adaptive binary arithmetic coding
section 33, the binarized symbol is then subjected to
probability estimation by a probability estimation section
34, and is subjected to adaptive arithmetic coding by an
encoding engine 35 based on the probability estimation.
After adaptive arithmetic coding processing has been carried
out, the related models are updated, and each model can
carry out encoding processing according to the statistics of
actual image compression information.
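
The estimate-then-update cycle of the adaptive binary arithmetic coding section can be illustrated with a toy model. The count-based estimator below is a stand-in chosen for simplicity; it is not the probability estimation defined for CABAC in H.26L, and the class and function names are hypothetical.

```python
from fractions import Fraction

class AdaptiveBinaryModel:
    """Toy per-context model: estimates P(bit) from running counts."""

    def __init__(self):
        self.counts = [1, 1]  # one pseudo-count each for bit 0 and bit 1

    def probability(self, bit):
        return Fraction(self.counts[bit], sum(self.counts))

    def update(self, bit):
        self.counts[bit] += 1

def estimate_bins(bins, contexts, num_contexts):
    """For each bin, look up the estimate in its context model, then
    update that model (the step performed after coding each bin)."""
    models = [AdaptiveBinaryModel() for _ in range(num_contexts)]
    estimates = []
    for bit, ctx in zip(bins, contexts):
        estimates.append(models[ctx].probability(bit))
        models[ctx].update(bit)
    return estimates

print(estimate_bins([1, 1, 1], [0, 0, 0], num_contexts=1))
```

As the same bin value recurs in a context, its estimated probability rises, so an arithmetic coder driven by these estimates would spend fewer bits on it.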

Here, context models for carrying out arithmetic coding
of the macroblock type (MB_type), motion vector information
(MVD), and reference frame parameter (Ref_frame), which are
syntax elements in image compression information, will now
be described.

Context model generation for MB_type will be described
for each of two cases: a case of intra-frame and a case of
inter-frame.

If macroblocks A, B, and C are arranged as shown in Fig.
6 on an intra-frame, the context model ctx_mb_type_intra(C)
corresponding to the MB_type of the macroblock C is defined
according to Expression (3) shown below. The mode of a
macroblock on an intra-frame is Intra4x4 or Intra16x16.

ctx_mb_type_intra(C) = A + B ... (3)

In Expression (3), A is 0 when the macroblock A is
Intra4x4 or 1 when the macroblock A is Intra16x16.
Similarly, B is 0 when the macroblock B is Intra4x4 or 1
when the macroblock B is Intra16x16. Therefore, the context
model ctx_mb_type_intra(C) takes one of 0, 1, and 2.
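
Expression (3) can be written as a small function. This is a sketch only; representing the macroblock modes as the strings "Intra4x4" and "Intra16x16" is an assumption made for this example.

```python
def ctx_mb_type_intra(mode_a, mode_b):
    """Expression (3): count how many of the neighboring macroblocks
    A and B are Intra16x16 (mode strings are illustrative labels)."""
    a = 1 if mode_a == "Intra16x16" else 0
    b = 1 if mode_b == "Intra16x16" else 0
    return a + b

print(ctx_mb_type_intra("Intra4x4", "Intra16x16"))  # 1
```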

If the macroblocks A, B, and C are arranged as shown in
Fig. 6 on an inter-frame which is a P picture, the context
model ctx_mb_type_inter(C) corresponding to the MB_type of
the macroblock C is defined according to Expression (4)
shown below. If the relevant inter-frame is a B picture,
the context model ctx_mb_type_inter(C) is defined according
to Expression (5) shown below.

ctx_mb_type_inter(C)
  = ((A==Skip)?0:1) + ((B==Skip)?0:1) ... (4)

ctx_mb_type_inter(C)
  = ((A==Direct)?0:1) + ((B==Direct)?0:1) ... (5)

In Expression (4), the operator ((A==Skip)?0:1)
indicates 0 if the macroblock A is in the Skip mode or 1 if
the macroblock A is not in the Skip mode. Similarly, the
operator ((B==Skip)?0:1) indicates 0 if the macroblock B is
in the Skip mode or 1 if the macroblock B is not in the Skip
mode.

In Expression (5), the operator ((A==Direct)?0:1)
indicates 0 if the macroblock A is in the Direct mode or 1
if the macroblock A is not in the Direct mode. Similarly,
the operator ((B==Direct)?0:1) indicates 0 if the macroblock
B is in the Direct mode or 1 if the macroblock B is not in
the Direct mode.

Therefore, there are three types of the context model
ctx_mb_type_inter(C) corresponding to the MB_type of the
macroblock C on an inter-frame, for each of the P picture
and the B picture.
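
Expressions (4) and (5) differ only in the mode they test for, which the following sketch makes explicit. The picture-type argument and the mode strings (including "Inter16x16") are illustrative labels, not identifiers taken from H.26L.

```python
def ctx_mb_type_inter(picture_type, mode_a, mode_b):
    """Expression (4) for a P picture, Expression (5) for a B picture."""
    if picture_type not in ("P", "B"):
        raise ValueError("picture_type must be 'P' or 'B'")
    # The neighbor contributes 0 when it is in this mode, 1 otherwise.
    tested_mode = "Skip" if picture_type == "P" else "Direct"
    a = 0 if mode_a == tested_mode else 1
    b = 0 if mode_b == tested_mode else 1
    return a + b

print(ctx_mb_type_inter("P", "Skip", "Inter16x16"))  # 1
```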

Context model generation for motion vector information
(MVD) will now be described.

Motion vector information corresponding to the
macroblock of interest included in image compression
information is encoded as prediction errors from the motion
vectors corresponding to the neighboring macroblocks. The
evaluation function e_k(C) for the macroblock C of interest,
from among the macroblocks A, B, and C arranged as shown in
Fig. 7, is defined according to Expression (6) shown below.
In Expression (6), k=0 indicates the horizontal component,
whereas k=1 indicates the vertical component.

e_k(C) = |mvd_k(A)| + |mvd_k(B)| ... (6)

Here, mvd_k(A) and mvd_k(B) indicate motion vector
prediction errors with respect to the macroblocks A and B,
respectively, neighboring the macroblock C.

In Expression (6), if the macroblock C is disposed at
the left edge of the picture frame, i.e., if one of the
macroblocks A and B does not exist, information related to
the corresponding motion vector prediction error mvd_k(A) or
mvd_k(B) cannot be obtained, and hence the corresponding term
on the right-hand side of Expression (6) is ignored. The
context model ctx_mvd(C,k) corresponding to e_k(C) defined as
described above is defined according to Expressions (7-1) to
(7-3) below.

ctx_mvd(C,k) = 0    e_k(C) < 3         ... (7-1)
ctx_mvd(C,k) = 1    32 < e_k(C)        ... (7-2)
ctx_mvd(C,k) = 2    3 ≤ e_k(C) ≤ 32    ... (7-3)



Context model generation for the motion vector
information (MVD) is carried out as shown in Fig. 8. More
specifically, the motion vector prediction error mvd_k(C) for
the macroblock C is divided into the absolute value
|mvd_k(C)| and the sign. The absolute value |mvd_k(C)| is
binarized. The first bin (the leftmost value) of the
binarized absolute value |mvd_k(C)| is encoded using the
above-described context model ctx_mvd(C,k). The second bin
(the second value from the left) is encoded using context
model 3. Similarly, the third and fourth bins are encoded
using context models 4 and 5, respectively. The fifth bin
and the subsequent bins are encoded using context model 6.
The sign of mvd_k(C) is encoded using context model 7. As
described above, motion vector information (MVD) is encoded
using eight types of context models.
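
The selection of context models for MVD can be sketched as follows. The sketch assumes that a missing neighbor is passed as None and that bins are numbered from 1; the function names are hypothetical, and the binarization itself (Fig. 8) is not reproduced.

```python
SIGN_CONTEXT = 7  # context model used for the sign of mvd_k(C)

def ctx_mvd(mvd_a, mvd_b):
    """Context for the first bin, from Expressions (6) and (7-1) to (7-3).

    mvd_a and mvd_b are the k-th components of the prediction errors of
    the neighbors A and B; None means the neighbor does not exist, in
    which case its term is ignored.
    """
    e = sum(abs(v) for v in (mvd_a, mvd_b) if v is not None)
    if e < 3:
        return 0
    if e > 32:
        return 1
    return 2

def context_for_bin(bin_index, first_bin_ctx):
    """Context model for the bin_index-th bin (1-based) of |mvd_k(C)|."""
    if bin_index == 1:
        return first_bin_ctx      # 0, 1, or 2
    if bin_index <= 4:
        return bin_index + 1      # bins 2, 3, 4 use models 3, 4, 5
    return 6                      # fifth and subsequent bins

first = ctx_mvd(3, 0)
print([context_for_bin(i, first) for i in range(1, 7)])  # [2, 3, 4, 5, 6, 6]
```

Together with the sign context, the values 0 to 7 account for the eight context models mentioned above.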

Context models for encoding the reference frame
parameter (Ref_frame) will now be described.

When two or more reference frames are used for an
inter-frame, information related to the reference frame is
set for each macroblock of the inter-frame. If the
reference frame parameters for the macroblocks A and B are
represented as A and B, respectively, with respect to the
macroblocks A, B, and C arranged as shown in Fig. 6, the
context model ctx_ref_frame(C) for the macroblock C is
defined according to Expression (8) shown below.

ctx_ref_frame(C) = ((A==0)?0:1) + 2((B==0)?0:1) ... (8)

In Expression (8), the operator ((A==0)?0:1) is 0 when
the reference frame parameter for the macroblock A is 0 or 1
when the reference frame parameter for the macroblock A is
not 0. Similarly, the operator ((B==0)?0:1) is 0 when the
reference frame parameter for the macroblock B is 0 or 1
when the reference frame parameter for the macroblock B is
not 0.

Thus, four types of context models for encoding the
reference frame parameter (Ref_frame) are defined according
to Expression (8). Furthermore, the context model for the
second bin and the context models for the third bin and the
subsequent bins are defined.
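
Expression (8) translates directly into code. In this sketch the reference frame parameters of the neighbors are assumed to be plain integers.

```python
def ctx_ref_frame(ref_a, ref_b):
    """Expression (8): ref_a and ref_b are the reference frame
    parameters of the neighboring macroblocks A and B."""
    return (0 if ref_a == 0 else 1) + 2 * (0 if ref_b == 0 else 1)

# The four combinations give the four context models 0 to 3.
print(sorted({ctx_ref_frame(a, b) for a in (0, 1) for b in (0, 1)}))
```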

Context models for arithmetically encoding the code
block pattern (CBP), which is a syntax element related to
the texture information included in the image compression
information according to H.26L, the intra-prediction mode
(IPRED), and the (RUN, LEVEL) information will now be
described.

The description starts with context models related to
the code block pattern. The handling of code block patterns
for macroblocks other than an Intra16x16 macroblock is
defined as follows.

That is, as the CBP bits for the luminance signal, one
CBP bit is included in each of four 8x8 blocks of an
Intra16x16 macroblock, i.e., a total of four CBP bits. When
the macroblocks A, B, and C are arranged as shown in Fig. 6,
the context model ctx_cbp_luma(C) corresponding to the
luminance signal of the macroblock C is defined according to
Expression (9) shown below.

ctx_cbp_luma(C) = A + 2B ... (9)

In Expression (9), A indicates the CBP bit of the
luminance signal of the macroblock A, and B indicates the
CBP bit of the luminance signal of the macroblock B.

The remaining two bits in the CBP field are related to
the chrominance signal. The context model
ctx_cbp_chroma_sig(C) corresponding to the chrominance
signal of the macroblock C is defined according to
Expression (10) shown below.

ctx_cbp_chroma_sig(C) = A + 2B ... (10)

In Expression (10), A indicates the CBP bit of the
chrominance signal of the macroblock A, and B indicates the
CBP bit of the chrominance signal of the macroblock B.

Here, if the context model ctx_cbp_chroma_sig(C)
corresponding to the chrominance signal of the macroblock C
is not 0, i.e., if the AC components of the chrominance
signal exist, the context model ctx_cbp_chroma_ac(C)
corresponding to the AC components of the chrominance signal
of the macroblock C defined according to Expression (11)
shown below needs to be encoded.

ctx_cbp_chroma_ac(C) = A + 2B ... (11)

In Expression (11), A indicates the cbp_chroma_ac decision
corresponding to the macroblock A, and B indicates the
cbp_chroma_ac decision corresponding to the macroblock B.

Since the context models defined according to
Expressions (9) to (11) are defined separately for the
intra-macroblock and the inter-macroblock, a total of 24
(= 2 x 3 x 4) types of context models are defined.
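
The count of 24 can be reproduced by enumerating the combinations. The labels used below for the macroblock class and for the three expressions are illustrative names for this sketch only.

```python
def ctx_cbp(bit_a, bit_b):
    """Common form A + 2B of Expressions (9), (10), and (11)."""
    return bit_a + 2 * bit_b

MACROBLOCK_CLASSES = ("intra", "inter")                 # 2 classes
CBP_ELEMENTS = ("luma", "chroma_sig", "chroma_ac")      # Expressions (9)-(11)

# One context model per (class, element, context value) combination.
models = {
    (mb, element, ctx_cbp(a, b))
    for mb in MACROBLOCK_CLASSES
    for element in CBP_ELEMENTS
    for a in (0, 1)
    for b in (0, 1)
}
print(len(models))  # 24
```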

Furthermore, in the case of an Intra16x16 macroblock,
one type of context model is defined for the binarized AC
decision, and one type of context model is defined for each
component of the chrominance signal.

Context models related to the intra-prediction mode
(IPRED) will now be described. Six types of
intra-prediction modes (labels 0 to 5) defined in H.26L will
now be described with reference to Figs. 9 and 10. Fig. 9
shows pixels a to p existing in a 4x4 block generated by
dividing a macroblock and pixels A to I existing in the
neighboring 4x4 blocks. Labels 1 to 5 in Fig. 10 indicate
intra-prediction modes with different directions. The
intra-prediction mode indicated by label 0 is a DC
prediction mode (DC Prediction).

In the intra-prediction mode of label 0, the pixels a
to p are predicted according to Expression (12) shown below.

pixels a to p = (A+B+C+D+E+F+G+H)//8 ... (12)

In Expressions (12) to (15), A to I indicate the pixels A to
I, respectively, and the symbol "//" means an arithmetic
operation such that the result of division is rounded off.

In the intra-prediction mode indicated by label 0, if
four pixels (e.g., the pixels A to D) of the eight pixels A
to H do not exist in the picture frame, Expression (12) is
not used and the mean value of the remaining four pixels
(the pixels E to H in this case) is used as predicted values
for the pixels a to p. Furthermore, if none of the eight
pixels A to H exists in the picture frame, Expression (12)
is not used and a predetermined value (e.g., 128) is used as
predicted values of the pixels a to p.
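
The DC prediction rule and its two fallbacks can be sketched as follows. Two points are assumptions of this sketch: the rounding operation "//" is taken to round halves upward, and the four-pixel mean is rounded in the same way. Note that Python's own `//` operator is floor division, so the rounding is implemented explicitly.

```python
def rounded_div(numerator, denominator):
    """The document's "//": division with the result rounded off
    (assumed here to round halves upward; inputs are non-negative)."""
    return (numerator + denominator // 2) // denominator

def dc_predict(top, left, default=128):
    """Predicted value shared by pixels a to p in the label 0 mode.

    top  : [A, B, C, D], or None if those pixels are outside the frame
    left : [E, F, G, H], or None if those pixels are outside the frame
    """
    if top is not None and left is not None:
        return rounded_div(sum(top) + sum(left), 8)   # Expression (12)
    if top is not None:
        return rounded_div(sum(top), 4)               # mean of A to D
    if left is not None:
        return rounded_div(sum(left), 4)              # mean of E to H
    return default                                    # e.g., 128

print(dc_predict([10, 20, 30, 40], [50, 60, 70, 80]))  # 45
```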

The intra-prediction mode indicated by label 1 is
called Vertical/Diagonal Prediction. The intra-prediction
mode of label 1 is used only when the four pixels A to D
exist in the picture frame. In this case, the pixels a to p
are predicted according to Expressions (13-1) to (13-6)
shown below.

pixel a = (A+B)//2 ... (13-1)
pixel e = B ... (13-2)
pixels b, i = (B+C)//2 ... (13-3)
pixels f, m = C ... (13-4)
pixels c, j = (C+D)//2 ... (13-5)
pixels d, g, h, k, l, n, o, p = D ... (13-6)

The intra-prediction mode indicated by label 2 is
called Vertical Prediction. The intra-prediction mode of
label 2 is used only when the four pixels A to D exist in
the picture frame. In this case, the pixel A is used as
predicted values of, for example, the pixels a, e, i, and m,
and the pixel B is used as predicted values of, for example,
the pixels b, f, j, and n.

The intra-prediction mode indicated by label 3 is
called Diagonal Prediction. The intra-prediction mode of
label 3 is used only when the nine pixels A to I exist in
the picture frame. In this case, the pixels a to p are
predicted according to Expressions (14-1) to (14-7) shown
below.

pixel m = (H+2G+F)//4 ... (14-1)
pixels i, n = (G+2F+E)//4 ... (14-2)
pixels e, j, o = (F+2E+I)//4 ... (14-3)
pixels a, f, k, p = (E+2I+A)//4 ... (14-4)
pixels b, g, l = (I+2A+B)//4 ... (14-5)
pixels c, h = (A+2B+C)//4 ... (14-6)
pixel d = (B+2C+D)//4 ... (14-7)
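
The expressions numbered (14-1) to (14-7) can be evaluated together in a short sketch. It assumes that the pixels a to p of Fig. 9 are laid out row by row (a b c d / e f g h / i j k l / m n o p), which is consistent with each expression covering one diagonal, and that "//" rounds halves upward; both are assumptions of this example.

```python
def rounded_div4(n):
    """The document's "//4" with halves rounded upward (assumption).
    Python's own // operator is floor division."""
    return (n + 2) // 4

def diagonal_predict(A, B, C, D, E, F, G, H, I):
    """Return the 4x4 block [[a, b, c, d], ..., [m, n, o, p]]."""
    # One predicted value per diagonal, indexed by (column - row).
    by_offset = {
        -3: rounded_div4(H + 2 * G + F),  # m           (14-1)
        -2: rounded_div4(G + 2 * F + E),  # i, n        (14-2)
        -1: rounded_div4(F + 2 * E + I),  # e, j, o     (14-3)
        0: rounded_div4(E + 2 * I + A),   # a, f, k, p  (14-4)
        1: rounded_div4(I + 2 * A + B),   # b, g, l     (14-5)
        2: rounded_div4(A + 2 * B + C),   # c, h        (14-6)
        3: rounded_div4(B + 2 * C + D),   # d           (14-7)
    }
    return [[by_offset[col - row] for col in range(4)] for row in range(4)]

for row in diagonal_predict(0, 0, 0, 0, 0, 0, 0, 0, 4):
    print(row)
```
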

The intra-prediction mode indicated by label 4 is
called Horizontal Prediction. The intra-prediction mode of
label 4 is used only when the four pixels E to H exist in
the picture frame. In this case, the pixel E is used as
predicted values of, for example, the pixels a, b, c, and d,
and the pixel F is used as predicted values of, for example,
the pixels e, f, g, and h.

The intra-prediction mode indicated by label 5 is
called Horizontal/Diagonal Prediction. The intra-prediction
mode of label 5 is used only when the four pixels E to H
exist in the picture frame. In this case, the pixels a to p
are predicted according to Expressions (15-1) to (15-6)
shown below.

pixel a = (E+F)//2 ... (15-1)
pixel b = F ... (15-2)
pixels c, e = (F+G)//2 ... (15-3)
pixels f, d = G ... (15-4)
pixels i, g = (G+H)//2 ... (15-5)
pixels h, j, k, l, m, n, o, p = H ... (15-6)

Two context models are defined for each of the
intra-prediction modes of labels 0 to 5. More specifically,
one of the two context models is for the first bin for each
mode and the other of the two context models is for the
second bin for each mode. In addition to these context
models, one context model is defined for each of the two
bits in the Intra16x16 mode. Therefore, a total of 14
context models are defined for the intra-prediction mode.

Context models related to (RUN, LEVEL) will now be
described.

In H.26L, two types of scan methods shown in Figs. 11A
and 11B are defined as methods for rearranging
two-dimensional discrete cosine transform coefficients into
one-dimensional coefficients. The single scan technique
shown in Fig. 11A is used in all cases except for the
luminance signal of an intra-macroblock whose quantization
parameter QP is smaller than 24. The double scan technique
shown in Fig. 11B is used when the single scan technique is
not used.

In an inter-macroblock and an intra-macroblock with a
quantization parameter QP of 24 or larger, an average of one
non-zero coefficient exists for a 4x4 block, so a one-bit
EOB (End Of Block) signal is sufficient. For the
luminance signal of an intra-macroblock with a quantization
parameter QP smaller than 24, two or more non-zero
coefficients exist, and a one-bit EOB signal is not
sufficient. This is the reason that the double scan
technique shown in Fig. 11B is used.
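
The selection rule described above can be sketched as a small Python function. The function name and its boolean parameters are hypothetical; only the threshold of 24 and the two scan types come from the text.

```python
QP_THRESHOLD = 24  # threshold stated in the description


def select_scan(is_intra, is_luma, qp):
    """Return 'double' for intra luminance blocks with QP below 24, else 'single'."""
    if is_intra and is_luma and qp < QP_THRESHOLD:
        return "double"   # Fig. 11B: two or more non-zero coefficients expected
    return "single"       # Fig. 11A: a one-bit EOB signal is sufficient
```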

As shown in Fig. 12, nine types of context models are
defined for (RUN, LEVEL) according to the discrimination of
the above-described scan method, the discrimination between
DC block type and AC block type, the discrimination between
luminance signal and chrominance signal, and the
discrimination between intra-macroblock and inter-macroblock.
The LEVEL information is separated into the sign and
the absolute value. Four context models are defined
according to the corresponding Ctx_run_level shown in Fig.
12. More specifically, the first context model is defined
for the sign, the second context model is defined for the
first bin, the third context model is defined for the
second bin, and the fourth context model is defined for the
subsequent bins.

When LEVEL is not 0 (i.e., the LEVEL is not an EOB),
RUN described below is encoded. For RUN, two context models
are defined for each Ctx_run_level shown in Fig. 12: one for
the first bin and the other for the second and subsequent
bins.

Context models for the quantization-related parameter
Dquant that can be set at the macroblock level in image
compression information according to H.26L will now be
described.

The parameter Dquant is set when the code block pattern
for the macroblock includes a non-zero orthogonal transform
coefficient or the macroblock is 16x16 Intra Coded. The
parameter Dquant can range from -16 to 16. The quantization
parameter QUANT_new for the macroblock is calculated according
to Expression (16) shown below that uses the parameter
Dquant in the image compression information.

QUANT_new=modulo_32(QUANT_old+Dquant+32) ...(16)

In Expression (16), QUANT_old is the quantization parameter
used for the previous encoding or decoding.
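
Expression (16) can be checked with a short Python sketch. The function name next_quant is hypothetical. The range check on Dquant follows the range of -16 to 16 stated above, and the sketch assumes QUANT_old lies in 0 to 31.

```python
def next_quant(quant_old, dquant):
    """Apply Expression (16): QUANT_new = (QUANT_old + Dquant + 32) mod 32."""
    if not -16 <= dquant <= 16:
        raise ValueError("Dquant must lie in the range -16 to 16")
    # Adding 32 keeps the operand non-negative before the modulo is taken.
    return (quant_old + dquant + 32) % 32
```

For example, a QUANT_old of 30 with a Dquant of 5 wraps around to 3, and a QUANT_old of 2 with a Dquant of -5 wraps around to 29.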

The first context model ctx_dquant(C) for the parameter
Dquant of the macroblock C arranged as shown in Fig. 6 is
defined according to Expression (17) shown below.

ctx_dquant(C)=(A!=0) ...(17)

In Expression (17), A indicates the value of the parameter
Dquant of the macroblock A. The second context model is
defined for the first bin and the second context model is
defined for the second and the subsequent bins.

If a symbol which is input to the context models
described above is not binarized, the symbol must be
binarized before it can be input to the context models.
Syntax elements other than MB_type are binarized according
to the relationships shown in Fig. 13.

MB_type, ten types of which are defined for the P
picture, is binarized according to the relationship shown in
Fig. 14A. Furthermore, MB_type, 17 types of which are
defined for the B picture, is binarized according to the
relationships shown in Fig. 14B.

Registers for the above-described various context
models are pre-initialized with pre-calculated values, and
when a symbol is to be encoded, the occurrence frequencies
of the bins for a series of context models are successively
updated for a determination in the encoding of the
subsequent symbol.

If the occurrence frequency for a given context model
exceeds a predetermined value, the frequency counter is
scaled down. Through such periodic scaling processing,
dynamic occurrence of symbols can be handled easily.

For the arithmetic coding scheme for binarized symbols
in H.26L, the approach disclosed in a document "Arithmetic
Coding for Data Compression" (Witten et al., Comm. of the
ACM, 30 (6), 1987, pp. 520-541) (hereinafter, referred to as
Document 2) is applied, as of this writing.
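
The register update and periodic scaling described above can be sketched as follows. This is a minimal illustration, not the H.26L procedure: the class name BinContext, the initial counts, the limit of 64, and the choice of halving are all assumptions, since the text only says that the counter is scaled down when a predetermined value is exceeded.

```python
class BinContext:
    """Occurrence counts of the bin values 0 and 1 for one context model."""

    def __init__(self, count0=1, count1=1, limit=64):
        self.counts = [count0, count1]  # stands in for the pre-initialized register
        self.limit = limit              # hypothetical "predetermined value"

    def probability_of(self, bin_value):
        """Estimate the probability of bin_value from the current counts."""
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        """Count one occurrence and scale down once the limit is exceeded."""
        self.counts[bin_value] += 1
        if sum(self.counts) > self.limit:
            # Halving is an assumed scaling rule; each count is kept at 1 or
            # more so that neither bin value gets a zero probability.
            self.counts = [max(1, c // 2) for c in self.counts]
```

Because older occurrences are halved at each scaling step, recent symbols weigh more heavily in the estimate, which is one way to follow changing symbol statistics.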

In MPEG2, if an image signal to be input is of 
interlaced scan format, field/frame adaptive encoding 
processing can be carried out at the macroblock level. 

Although such specifications are not defined in H.26L
at present, a document "Interlace Coding Tools for H.26L
Video Coding" (L. Wang et al., VCEG-O37, Dec. 2001)
(hereinafter, referred to as Document 3) proposes that the
H.26L specifications be extended to support field/frame
adaptive encoding processing at the macroblock level.

The field/frame adaptive encoding processing at the
macroblock level proposed in Document 3 will now be
described.

According to the current H.26L, seven types of modes
(modes 1 to 7), as shown in Fig. 15, are defined as units of
motion prediction/compensation in a macroblock.

Document 3 proposes that a frame/field flag be disposed
between Run and MB_type as the syntax corresponding to the
macroblock in image compression information, as shown in
Fig. 16. If the value of the frame/field flag is 0, it
indicates that the relevant macroblock is to be subjected to
frame-based encoding. In contrast, if the value of the
frame/field flag is 1, it indicates that the relevant
macroblock is to be subjected to field-based encoding.

If the value of the frame/field flag is 1, i.e., if
field-based encoding is to be applied, the pixels in the
macroblock are rearranged row by row, as shown in Fig. 17.
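
A possible form of this row rearrangement is sketched below. Fig. 17 is not reproduced here, so the sketch assumes that the even-numbered rows (one field) are gathered into the upper half of the macroblock and the odd-numbered rows (the other field) into the lower half. The function names are hypothetical.

```python
def to_field_order(rows):
    """Gather even-numbered rows first, then odd-numbered rows (assumed layout)."""
    return rows[0::2] + rows[1::2]


def to_frame_order(rows):
    """Undo to_field_order by interleaving the two halves again."""
    half = len(rows) // 2
    interleaved = []
    for first_field_row, second_field_row in zip(rows[:half], rows[half:]):
        interleaved.extend([first_field_row, second_field_row])
    return interleaved
```

Applying to_frame_order to the output of to_field_order restores the original row order, which is the property a decoder would rely on.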

If the value of the frame/field flag is 1, five types
of modes (modes 1a to 5a), as shown in Fig. 18, i.e., the
five types of modes corresponding to the modes 3 to 7 in
Fig. 15, are defined as units of motion
prediction/compensation in the macroblock.

For example, in the mode 2a of Fig. 18, the blocks 0
and 1 out of the four 8x8 blocks 0 to 3 generated by
dividing the macroblock belong to the same field parity, and
the blocks 2 and 3 belong to the same field parity.
Furthermore, for example, in the mode 3a of Fig. 18, the
blocks 0 to 3 of the eight 4x8 blocks 0 to 7 generated by
dividing the macroblock belong to the same field parity, and
the blocks 4 to 7 belong to the same field parity.

The intra-prediction mode when the value of the
frame/field flag is 1 will now be described. For example,
the pixels a to p disposed in the 4x4 block shown in Fig. 9
are subjected to intra-prediction using the pixels A to I
disposed in the neighboring 4x4 blocks, also when the value
of the frame/field flag is 1. In this case, it should be
noted that all of the pixels a to p and the pixels A to I
belong to the same field parity.

A description when the pixels A to I and the pixels a
to p belong to the same macroblock will now be given with
reference to Fig. 19. The pixels a to p existing in the 4x4
block 7 generated by dividing the macroblock into 16 are
subjected to intra-prediction using the pixels A to I
disposed at the edges of the neighboring blocks 2, 3, and 6.

A description when the pixels A to I belong to a
macroblock different from that of the pixels a to p will now
be given with reference to Figs. 20A and 20B.

Fig. 20A shows that the frame/field flag values of the
macroblocks to the left of and above the macroblock for
processing are 1. In this case, the intra-prediction of the
pixels existing in the 4x4 block C generated by dividing the
target macroblock into 16 is carried out based on the pixels
in the 4x4 block A generated by dividing the macroblock to
the left into 16 and the pixels in the 4x4 block B generated
by dividing the macroblock above into 16. The
intra-prediction of the pixels existing in the 4x4 block C'
is carried out based on the pixels existing in the 4x4
block A' and the pixels existing in the 4x4 block B'.

Fig. 20B shows an example where the value of the
frame/field flag for the target macroblock for processing is
1 and the values of the frame/field flags for the
macroblocks to the left and above are 0. In this case, the
intra-prediction of the pixels existing in the 4x4 block C
generated by dividing the target macroblock into 16 is
carried out based on the pixels in the 4x4 block A generated
by dividing the macroblock to the left into 16 and the
pixels in the 4x4 block B generated by dividing the
macroblock above into 16. The intra-prediction of the
pixels existing in the 4x4 block C' is carried out based on
the pixels existing in the 4x4 block A' and the pixels
existing in the 4x4 block B'.

Intra-prediction of the chrominance signal will now be 
described with reference to Fig. 21. When the value of the 
frame/field flag is 1, only one type of intra-prediction 
mode for the chrominance signal is defined. 

A to D in Fig. 21 each represent a 4x4 block of the
chrominance signal. The blocks A and B belong to the first
field and the blocks C and D belong to the second field.
s_0 to s_2 are the sums of the chrominance signals existing
in the blocks which belong to the first field parity and
neighbor the blocks A to D. s_3 to s_5 are the sums of the
chrominance signals existing in the blocks which belong to
the second field parity and neighbor the blocks A to D.

The predicted values A to D respectively corresponding
to the blocks A to D are predicted according to Expressions
(18) shown below provided that s_0 to s_5 all exist in the
picture frame.

A=(s_0+s_2+4)/8
B=(s_1+2)/4
C=(s_3+s_5+4)/8
D=(s_4+2)/4 ...(18)

If only s_0, s_1, s_3, and s_4 of s_0 to s_5 exist in the
picture frame, the predicted values A to D respectively
corresponding to the blocks A to D are predicted according
to Expressions (19) shown below.

A=(s_0+2)/4
B=(s_1+2)/4
C=(s_3+2)/4
D=(s_4+2)/4 ...(19)

Furthermore, if only s_2 and s_5 of s_0 to s_5 exist in the
picture frame, the predicted values corresponding to the
blocks A to D are predicted according to Expressions (20)
shown below.

A=(s_2+2)/4
B=(s_2+2)/4
C=(s_5+2)/4
D=(s_5+2)/4 ...(20)
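
The three cases of Expressions (18) to (20) can be gathered into one Python sketch. The function name predict_chroma is hypothetical, a value of None marks a sum that does not exist in the picture frame, and "/" is assumed to be integer division (the added 4 and 2 then act as rounding offsets). Combinations not covered by the description raise an error rather than guessing a rule.

```python
def predict_chroma(s0=None, s1=None, s2=None, s3=None, s4=None, s5=None):
    """Return the predicted values (A, B, C, D) for the chrominance blocks."""
    sums = (s0, s1, s2, s3, s4, s5)
    present = {i for i, value in enumerate(sums) if value is not None}
    if present == {0, 1, 2, 3, 4, 5}:          # Expressions (18)
        return ((s0 + s2 + 4) // 8, (s1 + 2) // 4,
                (s3 + s5 + 4) // 8, (s4 + 2) // 4)
    if present == {0, 1, 3, 4}:                # Expressions (19)
        return ((s0 + 2) // 4, (s1 + 2) // 4,
                (s3 + 2) // 4, (s4 + 2) // 4)
    if present == {2, 5}:                      # Expressions (20)
        return ((s2 + 2) // 4, (s2 + 2) // 4,
                (s5 + 2) // 4, (s5 + 2) // 4)
    raise ValueError("combination of available sums not covered by (18)-(20)")
```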

Fig. 22 shows a method for encoding the residual
components of the chrominance signal after intra-prediction
has been applied as described above. More specifically,
each of the 4x4 blocks is subjected to orthogonal
transformation processing, the 2x2 blocks as shown in the
figure are generated using the DC components of the first
field and the second field, and orthogonal transformation
processing is again applied.
Motion prediction/compensation processing when the
value of the frame/field flag is 1 will now be described.
When the value of the frame/field flag is 1, there are six
types of motion prediction/compensation modes: an
inter-16x16 mode, an inter-8x16 mode, an inter-8x8 mode, an
inter-4x8 mode, and an inter-4x4 mode.

For example, the inter-16x16 mode is a mode in which
the motion vector information for the first field, the
motion vector information for the second field, and the
reference frame in the inter-8x16 mode are equivalent.

These six types of motion prediction/compensation modes
are respectively assigned Code_Numbers 0 to 5.

In the current H.26L, a multiple-frame prediction that
allows a plurality of reference frames to be provided, as
shown in Fig. 23, is specified. In the current frame-based
H.26L standard, information related to reference frames is
defined at the macroblock level such that the previously
encoded frame is assigned Code_Number 0, and the frames
preceding the frame with Code_Number 0 by one to five frames
are respectively assigned Code_Number 1 to Code_Number 5.
On the other hand, for field-based encoding, the first
field of the previously encoded frame is assigned
Code_Number 0, and the second field of the same frame is
assigned Code_Number 1. The first field of the frame
preceding the frame with Code_Number 0 is assigned
Code_Number 2 and the second field of the relevant frame is
assigned Code_Number 3. The first field of the frame
preceding the frame with Code_Number 2 is assigned
Code_Number 4 and the second field of the relevant frame is
assigned Code_Number 5.
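
The two numbering schemes can be summarized in a short sketch. The function names and the frames_back parameter (0 for the previously encoded frame, 1 for the frame before it, and so on) are hypothetical conveniences; the resulting Code_Number values follow the assignments described above.

```python
def frame_code_number(frames_back):
    """Frame-based case: Code_Number equals the distance from the previous frame."""
    if not 0 <= frames_back <= 5:
        raise ValueError("frame-based Code_Numbers cover 0 to 5")
    return frames_back


def field_code_number(frames_back, second_field):
    """Field-based case: two Code_Numbers per frame, first field before second."""
    if not 0 <= frames_back <= 2:
        raise ValueError("field-based Code_Numbers 0 to 5 cover three frames")
    return 2 * frames_back + (1 if second_field else 0)
```

With the same six Code_Numbers, field-based encoding therefore reaches back three frames, whereas frame-based encoding reaches back six.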

Furthermore, for macroblocks that are subjected to
field-based encoding, the reference field for the first
field and the reference field for the second field are
specified separately from each other.

The median prediction specified in the current H.26L
will now be described with reference to Fig. 24, followed by
the description of a motion vector information prediction
method when the value of the frame/field flag is 1. The
16x16, 8x8, or 4x4 motion vector information corresponding
to the 16x16 macroblock E shown in Fig. 24 is predicted
using the median of the motion vector information of the
neighboring macroblocks A to C.

Any of the macroblocks A to C that does not exist in
the picture frame, however, is assumed to have a motion
vector information value of 0 for median calculation. If,
for example, the macroblocks D, B, and C do not exist in the
picture frame, the motion vector information corresponding
to the macroblock A is used as the predicted value.
Furthermore, if the macroblock C does not exist in the
picture frame, the median is calculated using the motion
vector information of the macroblock D instead of the
macroblock C.
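
The median prediction and its boundary rules can be sketched as follows. The function names are hypothetical, None marks a macroblock that does not exist in the picture frame, and taking the median separately for the horizontal and vertical components is an assumption, as is returning a zero vector when no neighbor exists at all.

```python
def median3(x, y, z):
    """Return the middle value of three numbers."""
    return sorted((x, y, z))[1]


def predict_mv(a, b, c, d):
    """Predict the motion vector of E from neighbors A, B, C (D replaces C).

    Each argument is an (mvx, mvy) tuple, or None if that macroblock does
    not exist in the picture frame.
    """
    if b is None and c is None and d is None:
        # Only A can exist: its motion vector is used as the predicted value.
        return a if a is not None else (0, 0)
    if c is None:
        c = d  # D is used instead of C when C is outside the picture frame
    # Any remaining missing neighbor contributes a zero motion vector.
    a, b, c = [(0, 0) if mv is None else mv for mv in (a, b, c)]
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```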

The reference frames for the macroblocks A to D do not 
need to be the same. 

A description when the block size of the macroblock is
8x16, 16x8, 8x4, or 4x8 will now be given with reference to
Figs. 25A to 25D. The macroblock E of interest and the
neighboring macroblocks A to D are assumed to be arranged as
shown in Fig. 24.

Fig. 25A shows an example where the block sizes of the
macroblocks E1 and E2 are 8x16. For the left-hand
macroblock E1, if the neighboring macroblock A to the left
refers to the same frame as the macroblock E1, the motion
vector information of the macroblock A is used as the
predicted value. If the neighboring macroblock A to the
left refers to a frame different from that referred to by
the macroblock E1, the above-described median prediction is
applied.

For the right-hand macroblock E2, if the neighboring
macroblock C to the upper right refers to the same frame as
the macroblock E2, the motion vector information of the
macroblock C is used as the predicted value. If the
neighboring macroblock C to the upper right refers to a
frame different from that referred to by the macroblock E2,
the above-described median prediction is applied.

Fig. 25B shows an example where the block sizes of the
macroblocks E1 and E2 are 16x8. For the upper macroblock E1,
if the neighboring macroblock B above refers to the same
frame as the macroblock E1, the motion vector information of
the macroblock B is used as the predicted value. If the
neighboring macroblock B above refers to a frame different
from that referred to by the macroblock E1, the
above-described median prediction is applied.

For the lower macroblock E2, if the neighboring
macroblock A to the left refers to the same frame as the
macroblock E2, the motion vector information of the
macroblock A is used as the predicted value. If the
neighboring macroblock A to the left refers to a frame
different from that referred to by the macroblock E2, the
above-described median prediction is applied.
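
The rules for the 8x16 and 16x8 cases share one pattern: use a preferred neighbor if it refers to the same frame, otherwise fall back to the median prediction. A minimal sketch is given below. The table layout, the function name, and the argument shapes are hypothetical, and the median value is passed in rather than recomputed. Treating a missing preferred neighbor like a reference mismatch is also an assumption.

```python
# Preferred neighbor for each (block size, partition index), following the
# description of Figs. 25A and 25B. Index 0 is E1 and index 1 is E2.
PREFERRED_NEIGHBOR = {
    ("8x16", 0): "A",  # left-hand E1: neighbor A to the left
    ("8x16", 1): "C",  # right-hand E2: neighbor C to the upper right
    ("16x8", 0): "B",  # upper E1: neighbor B above
    ("16x8", 1): "A",  # lower E2: neighbor A to the left
}


def predict_partition_mv(size, index, ref_frame, neighbors, median_mv):
    """Return the predicted motion vector for one partition.

    neighbors maps "A", "B", "C" to (motion_vector, ref_frame) or None.
    median_mv is the result of the median prediction, used as the fallback.
    """
    candidate = neighbors.get(PREFERRED_NEIGHBOR[(size, index)])
    if candidate is not None and candidate[1] == ref_frame:
        return candidate[0]
    return median_mv
```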

Fig. 25C shows an example where the block sizes of the
macroblocks E1 to E8 are 8x4. The above-described median
prediction is applied for the left-hand macroblocks E1 to E4,
and the motion vector information of the left-hand
macroblocks E1 to E4 is used as the predicted values for the
right-hand macroblocks E5 to E8.

Fig. 25D shows an example where the block sizes of the
macroblocks E1 to E8 are 4x8. The above-described median
prediction is applied for the upper macroblocks E1 to E4,
and the motion vector information of the upper macroblocks
E1 to E4 is used as the predicted values for the lower
macroblocks E5 to E8.

Also, if the value of the frame/field flag is 1, the
horizontal direction component of the motion vector
information is predicted in compliance with the
above-described method. For the vertical direction component,
however, a field-based block and a frame-based block are
mixed, and the following processing is carried out. The
macroblock E of interest and the neighboring macroblocks A
to D are assumed to be arranged as shown in Fig. 24.
When the macroblock E is to be subjected to frame-based
encoding provided that one of the neighboring macroblocks A
to D has been subjected to field-based encoding, the mean
value between the vertical direction component of the motion
vector information for the first field and the vertical
direction component of the motion vector information for the
second field is multiplied by two, and the result is used as
an equivalent to the frame-based motion vector information
for prediction processing.

When the macroblock E is to be subjected to field-based
encoding provided that one of the neighboring macroblocks A
to D has been subjected to frame-based encoding, the
vertical direction component value of the motion vector
information is divided by two, and the result is used as an
equivalent to the field-based motion vector information for
prediction processing.
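
The two conversions of the vertical direction component can be written out as below. The function names are hypothetical, and no rounding rule is given in the text, so the sketch keeps the results as plain (possibly fractional) numbers.

```python
def field_neighbor_to_frame(v_first_field, v_second_field):
    """Frame-based E, field-based neighbor: mean of the two fields times two."""
    mean = (v_first_field + v_second_field) / 2
    return mean * 2


def frame_neighbor_to_field(v_frame):
    """Field-based E, frame-based neighbor: vertical component divided by two."""
    return v_frame / 2
```

The factor of two reflects that a field contains every other row of the frame, so a given vertical displacement spans half as many field rows as frame rows.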

According to Document 3, a syntax element necessary for
field/frame encoding at the macroblock level is added, and
furthermore, the semantics of a syntax element such as
motion vector information is changed. Nevertheless,
Document 3 neither introduces a new context model nor
updates an existing context model in response to the
above-described addition and change. Thus, the information
provided in Document 3 is not sufficient to carry out
field/frame encoding at the macroblock level using the CABAC
scheme.

CABAC is known as a scheme which achieves a higher
encoding efficiency, though it requires a larger amount of
arithmetic operation for encoding processing compared with
UVLC, and therefore it is preferable that CABAC is available
for field/frame encoding at the macroblock level even when
input image information has an interlaced scan format.



Disclosure of Invention 

In view of the situation described above, an object of
the present invention is to enable field/frame encoding at
the macroblock level to be performed using the CABAC scheme
even when input image information has an interlaced scan
format.

An encoding apparatus according to the present
invention includes lossless encoding means for carrying out
lossless encoding processing using a context model
corresponding to a frame/field flag indicating whether the
encoding processing at the macroblock level is field-based
or frame-based, a context model corresponding to a syntax
element for carrying out the frame-based encoding processing,
and a context model corresponding to a syntax element for
carrying out the field-based encoding processing.

The context model corresponding to the syntax element
for carrying out the field-based encoding processing may
include at least one of the context models corresponding to
an MB_type for an I picture, an MB_type for a P/B picture,
motion vector information, a reference field parameter, and
an intra-prediction mode.
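
One way to picture this arrangement is a store that hands out a separate context model for field-based encoding only for the listed syntax elements. The sketch below is purely illustrative: the class name, the string keys, and the dictionary-based model are hypothetical, and giving all five listed elements a field-specific model is an assumption, since the text says "at least one of".

```python
# Syntax elements assumed here to have their own field-based context model.
FIELD_SPECIFIC = {"MB_type_I", "MB_type_PB", "MVD", "ref_field",
                  "intra_pred_mode"}


class ContextModelStore:
    """Hands out one context model per (syntax element, frame/field variant)."""

    def __init__(self):
        self._models = {}

    def lookup(self, syntax_element, field_based):
        """Return the model for this element under the current flag value."""
        use_field = field_based and syntax_element in FIELD_SPECIFIC
        key = (syntax_element, "field" if use_field else "frame")
        # A model is represented by its two bin counts in this sketch.
        return self._models.setdefault(key, {"counts": [1, 1]})
```

Elements outside the set, including the frame/field flag itself, share a single model regardless of the flag value.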

An encoding method according to the present invention
includes a lossless encoding step of carrying out lossless
encoding processing using a context model corresponding to a
frame/field flag indicating whether the encoding processing
at the macroblock level is field-based or frame-based, a
context model corresponding to a syntax element for carrying
out the frame-based encoding processing, and a context model
corresponding to a syntax element for carrying out the
field-based encoding processing.

A program on a first recording medium according to the
present invention includes a lossless encoding step of
carrying out lossless encoding processing using a context
model corresponding to a frame/field flag indicating whether
the encoding processing at the macroblock level is
field-based or frame-based, a context model corresponding to
a syntax element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing.

A first program according to the present invention
enables a computer to execute a lossless encoding step of
carrying out lossless encoding processing using a context
model corresponding to a frame/field flag indicating whether
the encoding processing at the macroblock level is
field-based or frame-based, a context model corresponding to
a syntax element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing.

A decoding apparatus according to the present invention
includes decoding means for decoding image compression
information that is encoded using a context model
corresponding to a frame/field flag indicating whether the
encoding processing at the macroblock level is field-based
or frame-based, a context model corresponding to a syntax
element for carrying out the frame-based encoding processing,
and a context model corresponding to a syntax element for
carrying out the field-based encoding processing.

A decoding method according to the present invention
includes a decoding step of decoding image compression
information that is encoded using a context model
corresponding to a frame/field flag indicating whether the
encoding processing at the macroblock level is field-based
or frame-based, a context model corresponding to a syntax
element for carrying out the frame-based encoding processing,
and a context model corresponding to a syntax element for
carrying out the field-based encoding processing.

A program on a second recording medium according to the
present invention includes a decoding step of decoding image
compression information that is encoded using a context
model corresponding to a frame/field flag indicating whether
the encoding processing at the macroblock level is
field-based or frame-based, a context model corresponding to
a syntax element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing.

A second program according to the present invention
enables a computer to execute a decoding step of decoding
image compression information that is encoded using a
context model corresponding to a frame/field flag indicating
whether the encoding processing at the macroblock level is
field-based or frame-based, a context model corresponding to
a syntax element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing.

In the encoding apparatus, encoding method, and first
program according to the present invention, lossless
encoding is carried out using a context model corresponding
to a frame/field flag indicating whether the encoding
processing at the macroblock level is field-based or
frame-based, a context model corresponding to a syntax
element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing.

In the decoding apparatus, decoding method, and second
program according to the present invention, image
compression information that is encoded using a context
model corresponding to a frame/field flag indicating whether
the encoding processing at the macroblock level is
field-based or frame-based, a context model corresponding to
a syntax element for carrying out the frame-based encoding
processing, and a context model corresponding to a syntax
element for carrying out the field-based encoding processing
is decoded.

The encoding apparatus and the decoding apparatus may
be apparatuses independent of each other or may be a block
for carrying out encoding and decoding in a signal
processing apparatus.

Brief Description of the Drawings 

Fig. 1 is a block diagram showing the structure of a
known image information encoding apparatus for carrying out
image compression by orthogonal transformation and motion
compensation.

Fig. 2 is a block diagram showing the structure of an
image information decoding apparatus corresponding to the
image information encoding apparatus in Fig. 1.

Fig. 3 is a diagram showing an example of the
relationship between the occurrence probabilities of symbols
and their respective subintervals in arithmetic coding.

Fig. 4 is a diagram showing an example of arithmetic
coding.

Fig. 5 is a block diagram showing a typical structure
of a CABAC encoder.

Fig. 6 is a diagram illustrating a context model for
MB_type.

Fig. 7 is a diagram illustrating a context model for
motion vector information MVD.

Fig. 8 is a diagram illustrating the encoding of motion
vector information MVD based on a context model.

Fig. 9 is a diagram illustrating an intra-prediction
mode defined in H.26L.

Fig. 10 is a diagram illustrating the directions of the
intra-prediction modes indicated by labels 1 to 5.

Fig. 11A is a diagram illustrating the single scan
technique defined in H.26L.

Fig. 11B is a diagram illustrating the double scan
technique defined in H.26L.

Fig. 12 is a diagram showing a context model
corresponding to (RUN, LEVEL) defined in H.26L.

Fig. 13 is a diagram illustrating the binarization of
syntax elements other than MB_type in H.26L.

Fig. 14A is a diagram illustrating the binarization of
MB_type of the P picture in H.26L.

Fig. 14B is a diagram illustrating the binarization of
MB_type of the B picture in H.26L.

Fig. 15 is a diagram showing seven types of modes
defined in H.26L as a unit of motion prediction/compensation
in a macroblock.

Fig. 16 is a diagram showing syntax for image
compression information extended such that field/frame
adaptive encoding can be carried out at the macroblock level.

Fig. 17 is a diagram illustrating the rearrangement of
pixels of a macroblock when the macroblock is subjected to
field-based encoding.

Fig. 18 is a diagram showing five types of modes
defined as a unit of motion prediction/compensation when a
macroblock is subjected to field-based encoding.

Fig. 19 is a diagram illustrating the operating
principle for intra-prediction in a macroblock when the
macroblock is subjected to field-based encoding.

Fig. 20A is a diagram illustrating the operating
principle for intra-prediction across macroblocks when a
macroblock is subjected to field-based encoding.

Fig. 20B is a diagram illustrating the operating
principle for intra-prediction across macroblocks when a
macroblock is subjected to field-based encoding.

Fig. 21 is a diagram illustrating the operating
principle for intra-prediction for the chrominance signal
when a macroblock is subjected to field-based encoding.

Fig. 22 is a diagram illustrating the operating
principle for encoding the residual components of the
chrominance signal when a macroblock is subjected to
field-based encoding.

Fig. 23 is a diagram illustrating a multiple-frame
prediction specified in H.26L.

Fig. 24 is a diagram illustrating a method for
predicting motion vector information when a macroblock is
subjected to field-based encoding.

Fig. 25A is a diagram illustrating the generation of
predicted values of motion vector information in prediction
modes specified in H.26L.

Fig. 25B is a diagram illustrating the generation of
predicted values of motion vector information in prediction
modes specified in H.26L.

Fig. 25C is a diagram illustrating the generation of
predicted values of motion vector information in prediction
modes specified in H.26L.

Fig. 25D is a diagram illustrating the generation of
predicted values of motion vector information in prediction
modes specified in H.26L.

Fig. 26 is a block diagram showing an example structure
of an image information encoding apparatus according to an
embodiment of the present invention.

Fig. 27 is a block diagram showing an example structure
of the arithmetic coding section 58 in Fig. 26.

Fig. 28A is a diagram showing a table for binarizing
the MB_type of a macroblock belonging to a P picture when
the macroblock is subjected to field-based encoding.

Fig. 28B is a diagram showing a table for binarizing
the MB_type of a macroblock belonging to a B picture when
the macroblock is subjected to field-based encoding.

Fig. 29 is a block diagram showing an example structure
of an image information decoding apparatus according to an
embodiment of the present invention, the decoding apparatus
corresponding to the image information encoding apparatus in
Fig. 26.

Best Mode for Carrying Out the Invention

An image information encoding apparatus to which the
present invention is applied will now be described with
reference to Fig. 26. The relevant image information
encoding apparatus enables encoding to be performed using
the CABAC scheme even when input image information has an
interlaced scan format.

In the relevant image information encoding apparatus,
an A/D conversion section 51 converts an input image signal
as an analog signal to a digital signal and outputs it to a
picture sorting buffer 52. The picture sorting buffer 52
rearranges the input image information from the A/D
conversion section 51 according to the GOP structure of the
image compression information which is output from the
relevant image information encoding apparatus and outputs it
to an adder 54.

A field/frame determination section 53 determines which
of frame-based encoding and field-based encoding provides a
higher encoding efficiency to encode the macroblock of the
image to be processed, generates the appropriate frame/field
flag, and outputs the result to a field/frame conversion
section 55 and an arithmetic coding section 58.

When the macroblock to be processed is subjected to
inter-encoding, the adder 54 generates a differential image
between the input image via the field/frame determination
section 53 and the reference image from a motion
prediction/compensation section 64, and outputs the
differential image to the field/frame conversion section 55
and to an orthogonal transformation section 56. On the
other hand, when the macroblock to be processed is subjected
to intra-encoding, the adder 54 outputs the input image via
the field/frame determination section 53 as-is to the
field/frame conversion section 55 and to the orthogonal
transformation section 56.

When the macroblock to be processed is subjected to
field-based encoding, the field/frame conversion section 55
converts the input image from the adder 54 into a field
structure, and outputs the result to the orthogonal
transformation section 56. The orthogonal transformation
section 56 applies orthogonal transformation (e.g., discrete
cosine transformation or Karhunen-Loeve transform) to the
input image information, and supplies the obtained transform
coefficient to a quantization section 57. The quantization
section 57 applies quantization processing to the transform
coefficient supplied from the orthogonal transformation
section 56 under the control of the rate control section 65.

The arithmetic coding section 58 arithmetically encodes
each syntax element input from the quantization section 57
and the motion prediction/compensation section 64, as well
as the frame/field flag from the field/frame determination
section 53, based on the CABAC scheme, and supplies the
results to an accumulation buffer 59 for accumulation. The
accumulation buffer 59 outputs the accumulated image
compression information to the subsequent stage.

A dequantization section 60 dequantizes the quantized
orthogonal transform coefficient and outputs it to an
inverse orthogonal transformation section 61. The inverse
orthogonal transformation section 61 applies inverse
orthogonal transformation processing to the dequantized
transform coefficient, generates decoded image information,
and supplies it to a frame memory 62 for accumulation. When
the macroblock to be processed is subjected to field-based
encoding, a field/frame conversion section 63 converts the
decoded image information accumulated in the frame memory 62
into a field structure, and outputs it to the motion
prediction/compensation section 64.

The motion prediction/compensation section 64 generates
the optimal prediction mode information and the motion
vector information through motion prediction processing and
outputs them to the arithmetic coding section 58. Furthermore,
the motion prediction/compensation section 64 generates a
predicted image and outputs it to the adder 54. A rate
control section 65 performs feedback control of the
operation of the quantization section 57 based on the amount
of data accumulated in the accumulation buffer 59. A
control section 66 controls each section of the relevant
image information encoding apparatus according to a control
program recorded on a recording medium 67.

The operating principle of the arithmetic coding
section 58 will now be described with reference to Fig. 27.
Fig. 27 shows an example structure of the arithmetic coding
section 58. From among the syntax elements of the input
image compression information, the frame/field flag shown in
Fig. 16 is first encoded by a frame/field flag context model
91 in the arithmetic coding section 58.

When the macroblock to be processed is subjected to
frame-based encoding, a frame-based context model 92,
specified in the current H.26L standard, is applied. For
syntax elements having a non-binarized value, such a value
is binarized by a binarization section 93 and arithmetic
coding is then applied.

On the other hand, when the macroblock to be processed
is subjected to field-based encoding, a field-based context
model 94 is applied for the syntax elements described below.
For syntax elements having a non-binarized value, such a value
is binarized by a binarization section 95 and arithmetic
coding is then applied. More specifically, the first syntax
element is MB_type for the I picture, the second syntax
element is MB_type for the P/B picture, the third syntax
element is motion vector information, the fourth syntax
element is a reference field parameter, and the fifth syntax
element is an intra-prediction mode.

The following description assumes that the macroblocks
A, B, and C are arranged as shown in Fig. 6. Context models
related to the frame/field flag will now be described. The
context model ctx_fifr_flag(C) related to the frame/field
flag of the macroblock C is defined according to Expression
(21) shown below.

ctx_fifr_flag(C) = a + 2b ... (21)

In Expression (21), a and b are the values of the
frame/field flags of the macroblocks A and B, respectively.
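
As an illustration only, Expression (21) can be sketched in Python as follows. The function name mirrors the context model name used above; the argument validation and the enumeration of neighbor combinations are assumptions added for the example and are not part of the described apparatus.

```python
def ctx_fifr_flag(a: int, b: int) -> int:
    """Context index for the frame/field flag of macroblock C, per Expression (21).

    a, b: frame/field flags (0 or 1) of the neighboring macroblocks A and B.
    """
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("frame/field flags must be 0 or 1")
    return a + 2 * b


# Enumerate every combination of neighbor flags and the context it selects.
contexts = {(a, b): ctx_fifr_flag(a, b) for a in (0, 1) for b in (0, 1)}
```

Because b is weighted by 2, the four possible neighbor combinations map to four distinct context indices, 0 to 3.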

Context models related to MB_type for the I picture
will now be described. When the frame/field flag is 1, the
context model ctx_mb_type_intra_field(C) corresponding to
the MB_type of the macroblock C included in the I picture is
defined according to Expression (22) shown below, as with
Expression (3).

ctx_mb_type_intra_field(C) = A + B ... (22)

A and B in Expression (22) are the same as the respective
counterparts in Expression (3). It does not matter whether
the neighboring macroblocks A and B are subjected to
field-based encoding or frame-based encoding.

Context models related to the MB_type for the P/B
picture will now be described. When the macroblock C is
included in the P picture, the context model
ctx_mb_type_inter_field(C) corresponding to the MB_type of
the macroblock C is defined according to Expression (23)
shown below. Furthermore, when the macroblock C is included
in the B picture, the context model
ctx_mb_type_inter_field(C) corresponding to the MB_type of
the macroblock C is defined according to Expression (24)
shown below.

ctx_mb_type_inter_field(C)
    = ((A==skip)?0:1) + 2((B==skip)?0:1) ... (23)

ctx_mb_type_inter_field(C)
    = ((A==Direct)?0:1) + 2((B==Direct)?0:1) ... (24)

The operators ((A==skip)?0:1) and ((B==skip)?0:1) in
Expression (23) are the same as those in Expression (4) and
the operators ((A==Direct)?0:1) and ((B==Direct)?0:1) in
Expression (24) are the same as those in Expression (5). It
does not matter whether the neighboring macroblocks A and B
are subjected to field-based encoding or frame-based
encoding.
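
A minimal sketch of Expressions (23) and (24) is given below for illustration. It assumes that the MB_type of each neighboring macroblock is available as a string label, with "skip" and "Direct" standing for the modes named in the expressions; these labels, the `picture` argument, and the function signature are hypothetical and are not defined in this description.

```python
def ctx_mb_type_inter_field(mb_type_a: str, mb_type_b: str, picture: str) -> int:
    """Context index for MB_type of macroblock C in a P or B picture.

    picture == "P" follows Expression (23), which tests for the skip mode;
    picture == "B" follows Expression (24), which tests for the Direct mode.
    """
    if picture == "P":
        excluded = "skip"
    elif picture == "B":
        excluded = "Direct"
    else:
        raise ValueError("picture must be 'P' or 'B'")
    term_a = 0 if mb_type_a == excluded else 1
    term_b = 0 if mb_type_b == excluded else 1
    return term_a + 2 * term_b
```

The same function serves neighbors encoded in either field or frame mode, since the expressions depend only on the MB_type of A and B.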

The non-binarized MB_type for the P picture is binarized
according to the table shown in Fig. 28A. Furthermore, the
non-binarized MB_type for the B picture is binarized according
to the table shown in Fig. 28B.

In an adaptive binary arithmetic coding section 96, the
binarized symbol is subjected to probability estimation by a
probability estimation section 97, and is subjected to
adaptive arithmetic coding based on probability estimation
by the encoding engine 98. The related models are updated
after the adaptive arithmetic coding processing. This
enables each model to carry out encoding processing
according to the statistics of actual image compression
information.
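
The estimate, encode, update cycle described above can be illustrated with the following sketch. It is not the probability estimation of the H.26L standard: a simple count-based estimator is substituted for clarity, and the ideal code length in bits stands in for the output of the encoding engine 98. The class and function names are hypothetical.

```python
import math


class AdaptiveBinaryModel:
    """Count-based estimate of bin probabilities for one context (illustrative)."""

    def __init__(self) -> None:
        # One initial count per bin value so that no probability is ever zero.
        self.counts = [1, 1]

    def probability(self, bin_value: int) -> float:
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value: int) -> None:
        self.counts[bin_value] += 1


def ideal_code_length(bins, model: AdaptiveBinaryModel) -> float:
    """Sum of -log2(p) over the bins, updating the model after each bin."""
    total_bits = 0.0
    for b in bins:
        total_bits += -math.log2(model.probability(b))  # estimate, then code
        model.update(b)                                  # update after coding
    return total_bits
```

For a skewed sequence such as fifteen 0 bins followed by a single 1 bin, the accumulated ideal length is well below the 16 bits that a fixed probability of 0.5 would require, which is the benefit of letting each model follow the actual statistics.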

For a macroblock that is subjected to frame-based
encoding, ten types of MB_type are defined if the macroblock
belongs to the P picture. On the other hand, for a
macroblock that is subjected to field-based encoding, the
16x16 mode and the 8x16 mode of the above-described ten types
of MB_type are not defined if the macroblock belongs to the P
picture. In short, eight types of MB_type are defined for a
P-picture-related macroblock that is subjected to
field-based encoding.

Eighteen types of MB_type are defined for a
B-picture-related macroblock that is subjected to
frame-based encoding. On the other hand, for a macroblock
that is subjected to field-based encoding and belongs to the
B picture, the forward 16x16 mode, backward 16x16 mode,
forward 8x16 mode, and backward 8x16 mode from among the
above-described 18 types of modes are not defined. In short,
for a B-picture-related macroblock that is subjected to
field-based encoding, 14 types of MB_type are defined.

Context models for motion vector information will now
be described. When the value of the frame/field flag is 1,
the first to third context models ctx_mvd_field(C, k)
corresponding to the motion vector information of the
macroblock C are defined according to Expressions (25-1) to
(25-3) shown below.

ctx_mvd_field(C, k) = 0    e_k(C) < 3 ... (25-1)

ctx_mvd_field(C, k) = 1    32 < e_k(C) ... (25-2)

ctx_mvd_field(C, k) = 2    3 ≤ e_k(C) ≤ 32 ... (25-3)

In Expressions (25-1) to (25-3), the evaluation function e_k
is defined according to Expression (26) shown below. The
macroblocks A and B exist in the same parity field.

e_k(C) = |mvd_k(A)| + |mvd_k(B)| ... (26)
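
The selection in Expressions (25-1) to (25-3) and the evaluation function of Expression (26) may be sketched as follows. The sketch assumes that the boundary values 3 and 32 belong to the middle range, so that every value of e_k(C) selects exactly one context; the function names and integer arguments are illustrative only.

```python
def e_k(mvd_a_k: int, mvd_b_k: int) -> int:
    """Evaluation function of Expression (26) for component k."""
    return abs(mvd_a_k) + abs(mvd_b_k)


def ctx_mvd_field(mvd_a_k: int, mvd_b_k: int) -> int:
    """Context index for component k of the motion vector information of C."""
    e = e_k(mvd_a_k, mvd_b_k)
    if e < 3:
        return 0   # Expression (25-1)
    if e > 32:
        return 1   # Expression (25-2)
    return 2       # Expression (25-3), boundaries 3 and 32 assumed inclusive
```

The arguments are the k-th components of the motion vector information of the neighbors A and B, taken from the same parity field as stated above.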

If the macroblock A has been subjected to frame-based
encoding, mvd_1_field(A) calculated from Expression (27) shown
below is applied to Expression (26) for the motion vector
information mvd_1(A) for the vertical direction component.
This is also applicable when the macroblock B has been
subjected to frame-based encoding.

mvd_1_field(A) = mvd_1_frame(A)/2 ... (27)

In contrast, if the macroblock C is subjected to
frame-based encoding and the neighboring block A has been
subjected to field-based encoding, mvd_k_frame(A) calculated
from Expressions (28-1) and (28-2) is applied to Expression
(26) respectively for the horizontal direction component and
the vertical direction component of mvd_k(A).

mvd_0_frame(A)
    = (mvd_0_top(A) + mvd_0_bottom(A))/2 ... (28-1)

mvd_1_frame(A)
    = mvd_1_top(A) + mvd_1_bottom(A) ... (28-2)
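
The conversions of Expressions (27), (28-1), and (28-2) can be illustrated as below. Motion vector information is represented here as a (horizontal, vertical) pair, i.e., components 0 and 1. The description does not state how the division by 2 is rounded, so true division is used as an assumption; the function names are hypothetical.

```python
def neighbor_mvd_for_field_mb(mvd, neighbor_is_frame_coded: bool):
    """Neighbor mvd as seen from a field-coded macroblock C.

    For a frame-coded neighbor, only the vertical component is halved,
    per Expression (27).
    """
    horizontal, vertical = mvd
    if neighbor_is_frame_coded:
        vertical = vertical / 2          # Expression (27), rounding unspecified
    return (horizontal, vertical)


def neighbor_mvd_for_frame_mb(mvd_top, mvd_bottom):
    """Neighbor mvd as seen from a frame-coded C when the neighbor is field-coded."""
    horizontal = (mvd_top[0] + mvd_bottom[0]) / 2   # Expression (28-1)
    vertical = mvd_top[1] + mvd_bottom[1]           # Expression (28-2)
    return (horizontal, vertical)
```

The converted components would then be passed to the evaluation function of Expression (26) in place of the raw neighbor values.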

Context models related to the reference field parameter
will now be described. When the value of the frame/field
flag is 1, the first context model ctx_ref_field_top(C)
corresponding to the first field is defined according to
Expression (29-1) shown below. Furthermore, the first
context model ctx_ref_field_bot(C) corresponding to the
second field is defined according to Expression (29-2) shown
below.

ctx_ref_field_top(C) = a_t + 2b_t ... (29-1)

ctx_ref_field_bot(C) = a_b + 2b_b ... (29-2)

In Expressions (29-1) and (29-2), the parameter a_t is
related to the first field of the neighboring macroblock A,
the parameter a_b is related to the second field of the
neighboring macroblock A, the parameter b_t is related to the
first field of the neighboring macroblock B, and the
parameter b_b is related to the second field of the
neighboring macroblock B, as defined in Expressions (30-1)
and (30-2) shown below.

a_t, a_b, b_t, b_b = 0
(when the reference field is the immediately previous
encoded field) ... (30-1)

a_t, a_b, b_t, b_b = 1 (otherwise) ... (30-2)

Context models corresponding to the second bin and the
subsequent bins are each defined in the same manner as with
the context model ctx_ref_frame(C) shown in Expression (8).
It is noted, however, that the Code_number to be encoded is
not assigned to a frame but to a field.
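
For illustration, the first-bin context models of Expressions (29-1) and (29-2), with the parameters of Expressions (30-1) and (30-2), may be sketched as follows. The Boolean arguments, which state whether the corresponding field of each neighbor refers to the immediately previous encoded field, are an assumed representation chosen for the example.

```python
def ref_param(refers_to_previous_field: bool) -> int:
    """Parameter a_t, a_b, b_t, or b_b per Expressions (30-1) and (30-2)."""
    return 0 if refers_to_previous_field else 1


def ctx_ref_field(a_refers_to_previous: bool, b_refers_to_previous: bool) -> int:
    """First-bin context index per Expression (29-1) or (29-2).

    Call once with the first-field data of A and B for ctx_ref_field_top(C)
    and once with their second-field data for ctx_ref_field_bot(C).
    """
    return ref_param(a_refers_to_previous) + 2 * ref_param(b_refers_to_previous)
```

The same function covers both fields because Expressions (29-1) and (29-2) differ only in which field of the neighbors supplies the parameters.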

Context models related to an intra-prediction mode will
now be described. When the value of the frame/field flag is
1, the context model ctx_intra_pred_field(C) related to the
intra-prediction mode corresponding to the macroblock C is
defined in the same manner as with the context model
ctx_intra_pred(C) for the macroblock in the frame mode. It
does not matter whether the neighboring macroblocks A and B
are subjected to field-based encoding or to frame-based
encoding.

As described above, field/frame encoding using the
CABAC scheme is made possible by introducing new context
models and changing existing context models.

Fig. 29 shows an example structure of an image
information decoding apparatus corresponding to the image
information encoding apparatus in Fig. 26.

In the relevant image information decoding apparatus,
an accumulation buffer 101 accumulates input image
compression information and outputs it to an arithmetic
decoding section 102, as required. The arithmetic decoding
section 102 applies arithmetic decoding processing to the
image compression information encoded based on the CABAC
scheme, outputs the decoded frame/field flag to field/frame
conversion sections 105 and 110, outputs the quantized
orthogonal transform coefficient to a dequantization section
103, and outputs the prediction mode information and the
motion vector information to a motion
prediction/compensation section 111.

The dequantization section 103 dequantizes the
quantized orthogonal transform coefficient decoded by the
arithmetic decoding section 102. An inverse orthogonal
transformation section 104 applies inverse orthogonal
transformation to the dequantized orthogonal transform
coefficient. If the macroblock to be processed has been
subjected to field-based encoding, the field/frame
conversion section 105 converts the output image or
differential image obtained as a result of inverse
orthogonal transformation into a frame structure.
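
The conversion between a frame structure and a field structure can be pictured with the following sketch. It assumes that a macroblock is held as a list of pixel rows, that the first (top) field consists of the even-numbered rows and the second (bottom) field of the odd-numbered rows, and that conversion is a plain interleaving of rows; this description does not specify the internal layout used by the conversion sections, so these details are assumptions.

```python
def frame_to_field(rows):
    """Split frame-ordered rows into (top_field_rows, bottom_field_rows)."""
    return rows[0::2], rows[1::2]


def field_to_frame(top_rows, bottom_rows):
    """Interleave top-field and bottom-field rows back into frame order."""
    if len(top_rows) != len(bottom_rows):
        raise ValueError("both fields must have the same number of rows")
    frame = []
    for top, bottom in zip(top_rows, bottom_rows):
        frame.append(top)
        frame.append(bottom)
    return frame
```

Under these assumptions the two functions are inverses of each other, which corresponds to the encoder-side conversion into a field structure and the decoder-side conversion back into a frame structure.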

If the macroblock to be processed is an
inter-macroblock, an adder 106 combines the differential image
from the inverse orthogonal transformation section 104 and
the reference image from the motion prediction/compensation
section 111 to generate an output image. A picture sorting
buffer 107 rearranges the output images according to the GOP
structure of the input image compression information and
outputs them to a D/A conversion section 108. The D/A
conversion section 108 converts the output image from a
digital signal into an analog signal and outputs it to the
subsequent stage.

A frame memory 109 stores the image information
generated by the adder 106, i.e., the image information from
which a reference image is generated. When the macroblock
to be processed has been subjected to field-based encoding,
the field/frame conversion section 110 converts the image
information stored in the frame memory 109 into a field
structure. The motion prediction/compensation section 111
generates a reference image from the image information
stored in the frame memory 109 based on the prediction mode
information and the motion vector information for each
macroblock included in the image compression information,
and outputs the reference image to the adder 106.

According to the image information decoding apparatus
constructed as described above, image compression
information output by the image information encoding
apparatus in Fig. 26 can be decoded into the original image
information.






The sequence of processing described above can be
implemented using not only hardware but also software. If
the sequence of processing is to be implemented using
software, a program constituting the software is installed
from, for example, a recording medium 67 in Fig. 26 to a
computer built into dedicated hardware or to, for example, a
general-purpose personal computer that requires programs to
be installed to carry out the corresponding functions.

The recording medium 67 may be a package medium
including a magnetic disk (including a flexible disk); an
optical disk (including a compact disc-read only memory,
i.e., CD-ROM, and a digital versatile disk, i.e., DVD); a
magneto-optical disk (including a mini-disc, i.e., MD); or a
semiconductor memory if such a program is supplied
separately from a user's computer. The recording medium may
be a ROM or a hard disk of a user's computer if the program
on the recording medium is supplied preinstalled on the
user's computer.

In the present invention, the steps of programs
recorded on the recording medium may or may not be followed
time-sequentially in order of the described steps.
Furthermore, the steps may be followed in parallel or
independently from one another.

Industrial Applicability

As described above, according to the present invention,
field/frame encoding using the CABAC scheme can be carried
out even when input image information has an interlaced scan
format.

Furthermore, according to the present invention, it is
possible to restore image information in an interlaced scan
format by decoding compression image information having
image information of interlaced scan format subjected to
field/frame encoding using the CABAC scheme at the
macroblock level.



